-
-
Notifications
You must be signed in to change notification settings - Fork 21.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimized physics object spawn time, improved bullet physics server code. #42643
Conversation
Do we have any demo projects available that have been ported to the current Maybe we could also have "pathological" physics demo that use lots of bodies and various APIs to stress test the implementation. |
Note: It's not particularly important, but remember that commit logs should be formatted as one title on the first line, then a blank line, and then the body of the commit message. Otherwise all of it is concerned part of the commit title, which leads to this kind of weirdness (see header): https://patch-diff.githubusercontent.com/raw/godotengine/godot/pull/42643.patch |
We have one, but it needs to be ported to the |
Here the test project TestSpawnRB.zip you can just open and launch it. You will see in the console the overall time taken to process 3 frames; you can test the PR and the current master branch to see the difference. |
ad35e42
to
46ce9c3
Compare
Title added. |
Here's a quick and dirty port of the 3D platformer demo to current |
This PR is difficult to review, because it is both complex and trying to do multiple things. I think, to enable it to be reviewed and tested properly and effectively, it needs to be broken up into separate, self-contained commits and probably PRs; perhaps one for each of its deliverables:
|
@madmiraal I'm not totally opposed to splitting, but I don't think it's really necessary. At this point, I would rather review this PR as it is, because asking @AndreaCatania to group his PRs when they are split, then to split them when they are grouped is not going to help with (hopefully) finalizing this optimization soon :) These two are a bit off-topic so they could be possibly separated. They don't add too much to the complexity though, because the first one is just one block and the second one is only in headers.
The others are all about optimizing physics spawn time and they affect different areas of code.
This one is the only really complex change. For test purpose, different parts can be disabled by calling
This change is localized in
This one is only affecting vector accesses, and it's an easy, obvious optimization since it just gets rid of copy-on-write. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes generally look ok, I've made a few comments for possible improvements.
I'll try to run my own tests about the performance gain later this week.
modules/bullet/space_bullet.cpp
Outdated
for (uint32_t i = 0; i < queue_pre_flush.size(); i += 1) { | ||
queue_pre_flush[i]->dispatch_callbacks(); | ||
queue_pre_flush[i]->is_in_flush_queue = false; | ||
} | ||
for (uint32_t i = 0; i < queue_flush.size(); i += 1) { | ||
queue_flush[i]->dispatch_callbacks(); | ||
queue_flush[i]->is_in_flush_queue = false; | ||
} | ||
queue_pre_flush.clear(); | ||
queue_flush.clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments here to explain what the logic is behind having two different queues could make it clearer for future contributors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes please more comments for future contributors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 awesome thx!
|
||
btCollisionShape *ShapeBullet::create_bt_shape(const Vector3 &p_implicit_scale, real_t p_extra_edge) { | ||
btVector3 s; | ||
G_TO_B(p_implicit_scale, s); | ||
return create_bt_shape(s, p_extra_edge); | ||
} | ||
|
||
btCollisionShape *ShapeBullet::create_bt_shape(const btVector3 &p_implicit_scale, real_t p_extra_edge) { | ||
if (p_extra_edge == 0.0 && (p_implicit_scale - btVector3(1, 1, 1)).length2() <= CMP_EPSILON) { | ||
return default_shape; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not creating the default shape here only when needed, to avoid always creating two shapes when there is a scale? Also, maybe with a ref count for the default shape to know when to destroy it, rather than storing the old default shape? That would simplify the logic around notifyShapeChanged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good catch! Integrated.
46ce9c3
to
e179750
Compare
I have problems to do proper performance tests at the moment, because I'm getting crashing asserts from Bullet when running TestSpawnRB project on latest master. It's this assert in
It looks like it's due to something weird happening around ccd. Looking at the callstack, this code shouldn't be run on bodies with default settings. From what I see, ccd seems to be activated on all bodies with this callstack:
Because
I'll open a PR to propose a fix so performance changes from this PR can be properly compared on master. |
…t server. - Optimized physics object spawn time, by integrating lazy body reload. - Optimized shape usage when the shape is not scaled. - Make sure to correctly fetch gravity when the integrate_forces function is used - Added override keyword - Using LocalVector
e179750
to
9cc2a73
Compare
@pouleyKetchoupp I saw that you fixed the CCD issue. I've rebased and the crash seems gone. Though, I didn't had that, so I can't really say it's fixed. Please let me know if you still have the crash (in case you still have that, plese paste the stack trace). |
Never mind, I'm able to reproduce the issue. Thought, it's not affecting this PR any more. |
@pouleyKetchoupp did you try this after the CCD fix? |
@AndreaCatania I've made some quick tests but I still need to do more in-depth performance comparison. I've been busy with other things but I'll finish reviewing it as soon as I can find extra time. |
@pouleyKetchoupp Ok thanks! |
I finally got to do proper performance comparisons, sorry for the delay! Here's the project I'm using (it's part of the physics tests framework, ported to 4.0): This PR is also needed on master to make it work properly: And I can confirm a clear improvement when adding objects (about 2x faster with 125000 objects on my configuration, when counting the extra time spent on the next physics step, since now a part of the process is postponed). On the other hand, it seems removing objects is almost 3x slower than before. I haven't investigated why exactly. @AndreaCatania could you please check that? Here are detailed timings I get with the attached test scene (timings are pretty consistent). Bullet / Current master
Bullet / PR 42643
Godot Physics 3D
|
This would need to be redone for I also agree that the scope of this PR seems to be a bit big and it might be easier to assess and merge some of the changes independently (e.g. switch to LocalVector could likely be merged as is). |
This is a combination of #39726 #40252 #41096 that got reverted here #42639
Test project: TestSpawnRB.zip you can just open an launch it.
The benchmark code is this:
This is flame graph extracted from the perf of this PR:
This is flame graph extracted from the perf of master branch:
Without this PR, the
set_
functions (likeset_layer
/set_mask
) remove and add the body from the Physics World to recompute the entire hierarchy and cache.Godot, call these functions one by one when a new
RigidBody
node is created, so the cache is internally reconstructed multiple times. With this PR, the body is created at the start of the frame and so all the code is executed only 1 time.